Selection index theory for populations under directional and stabilizing selection

Background: The purpose of a selection index is to select animals for breeding in a way that maximizes the profit of the breed in future generations. In general, the profit of a breed is a quantity that predicts how satisfied future owners will be with their breed, and how satisfied consumers will be with the products that the breed produces. Many traits, such as conformation traits and product quality traits, have intermediate optima. Traditional selection index theory applies only to directional selection and cannot achieve any further improvement once the trait means have reached their optima. A well-founded theory is needed that extends established selection index theory to cover directional and stabilizing selection as limiting cases, and that can be applied to maximize the profit of a breed in both situations.

Results: The optimum selection index shifts the trait means towards the optima and, in the case of stabilizing selection, decreases the phenotypic variance, which moves the phenotypes closer to the optimum. The optimum index depends not only on the breeding values, but also on the squared breeding values, the allele contents of major quantitative trait loci (QTL), the QTL heterozygosities, the inbreeding coefficient of the animal, and the kinship of the animal with the population.

Conclusion: The optimum selection index drives the alleles of major QTL to fixation when the trait mean approaches the optimum, because decreasing the phenotypic variance shifts the trait values closer to the optimum, which increases the profit of the breed. The index weight on the kinship coefficient balances two sources of genetic gain: the gain that outcrossing makes achievable in future generations, and the gain that reducing the phenotypic variance makes achievable under stabilizing selection. In a model with dominance variance, it can also account for the effect of inbreeding depression. The combining ability between potential mating partners, which predicts the total merit of their offspring, could become an important parameter for mate allocation and could be used to shift the phenotypes further towards their optimum values.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-023-00776-4.

1 Mathematical Background

Definition 1. For $\mu \in \mathbb{R}$ and $\sigma > 0$ let where $\Phi$ is the cumulative distribution function of the standard normal distribution.

The General Case
Definition 2. A total merit function is a function $\mathrm{TM} : \mathbb{R}^K \to \mathbb{R}$, where $K$ is the number of traits. The total merit of individual $i$ with phenotype vector $y_i \in \mathbb{R}^K$ is thus denoted as $\mathrm{TM}(y_i)$.
Definition 3. Let $\Xi$ denote the parameter space of the phenotypic distribution, and let $f(y|\xi)$ denote its density. Suppose that the function $\xi \mapsto f(y|\xi)$ is differentiable for all $y \in \mathbb{R}^K$. The profit $\varphi(\xi)$ of a population with parameter $\xi \in \Xi$ is given by the function $\varphi : \Xi \to \mathbb{R}$,
$$\varphi(\xi) = E_\xi(\mathrm{TM}(Y)) = \int \mathrm{TM}(y)\, f(y|\xi)\, dy,$$
where the random $K$-vector $Y$ contains the phenotype of an individual that is randomly chosen from the population.
Definition 4. For a population with two sexes and $N$ selection candidates, the set
$$C = \Big\{ c \in \mathbb{R}^N_{\ge 0} : \sum_{i:\, i \text{ male}} c_i = 0.5 \text{ and } \sum_{i:\, i \text{ female}} c_i = 0.5 \Big\}$$
is called the set of genetic contribution vectors.
Lemma 2. To simplify notation, we denote by $s_i \in \{m, f\}$ the sex of individual $i$ and by $\bar s_i$ the opposite sex. Each contribution vector $c \in C$ has a representation $c = c^m + c^f$, where the vector $c^m$ with male contributions equals zero for females, and the vector $c^f$ with female contributions equals zero for males.
Proof: This is clear.

□
Definition 5. A population is called an idealized population with state space $\Theta$ if the following conditions are satisfied:
1) The function $\theta_1 : C \to \Theta$ that provides the expected state of the population in generation 1 as a function of the vector $c \in C$ with genetic contributions is differentiable.
2) There exists a differentiable function $\Gamma : \Theta \to \Theta$ that provides the state $\theta_{n+1} = \Gamma(\theta_n)$ of the population in each subsequent generation.
3) There exists a differentiable function $\xi : \Theta \to \Xi$ that extracts the parameter of the phenotypic distribution from the population's state.
Note that for finite, real populations the mapping $\Gamma$ is only an approximation.
Definition 6. The mapping $\Gamma = \Gamma_\pi$ from Definition 5 and the state $\theta_n = \theta^\pi_n$ of the population in generation $n$ are determined by the breeding policy $\pi$ of the population. The set of breeding policies is denoted as $\Pi$. Each breeding policy $\pi = (T_\theta, \mathrm{tr}_\theta)_{\theta \in \Theta}$ provides, for a given state $\theta$ of the population, a truncation point $\mathrm{tr}_\theta$ and a function $T_\theta : U \to \mathbb{R}$, which is called the aggregate genotype function. Here, $U$ denotes the set of information sources. A vector $u_i \in U$ contains the unknown true genetic information on individual $i$ that could affect selection decisions, and $\hat u_i$ is an estimate of $u_i$.
The aggregate genotype function $T_\theta$ is used for calculating the selection index.

Definition 7. The expected profit of a breed after $n \ge 1$ generations of selection by following breeding policy $\pi$ is defined as where $Y_n$ is the random $K$-vector with phenotypic values of an individual that is chosen from generation $n$, $\theta_0$ is the current state of the population, $u$ is the $N \times L$ matrix with the unknown true genetic values of the $N$ selection candidates, and $\hat u$ is the $N \times L$ matrix with estimated genetic values of the $N$ selection candidates.
Definition 8. Let $\zeta = (\zeta_n)_{n \ge 1}$ be a sequence with $\zeta_n \ge 0$ and $\sum_{n=1}^{\infty} \zeta_n = 1$. Then, $\zeta_n$ is called the importance placed on the profit of the breed in generation $n$, and the expected future profit of the population is defined as

Lemma 3. For an idealized population, the expected future profit of the population equals where $c^\pi \in C$ is the vector with genetic contributions of the selection candidates, and $\tilde f^\pi$ Here, $\theta^\pi_n : C \to \Theta$, $\theta^\pi_n(c) = \Gamma_\pi^{n-1}(\theta_1(c))$ provides the state of the population in generation $n \ge 1$ as a function of the genetic contributions $c_i$ of the selection candidates. Note that $\theta_1$ in fact also depends on the matrix $u \in U^N$ with true genetic values of the selection candidates and on the current state $\theta_0$ of the population, while $c^\pi$ depends on the matrix $\hat u \in U^N$ with estimated genetic values. The parameters $u$ and $\hat u$, the vector $\theta_0$, and the superscript $\pi$ are often omitted in the following to keep the formulas readable.

Proof: □

Definition 9. The vector $\bar c \in C$ with default contributions is a vector for which $\bar\theta_1 = \theta_1(\bar c)$ is approximately the expected state of the population in the next generation.
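To make the role of the importance weights $\zeta_n$ concrete, the following Python sketch assumes that the expected future profit is the $\zeta$-weighted sum of the per-generation expected profits. It uses a hypothetical geometric choice $\zeta_n = (1-r)r^{n-1}$, which is non-negative and sums to 1; neither this choice nor the helper names come from the text above.

```python
def geometric_importance(r: float, n_max: int) -> list[float]:
    """Hypothetical weights zeta_n = (1 - r) * r**(n - 1) for n = 1..n_max.

    The infinite series sums to 1; truncating at n_max leaves a tail of r**n_max.
    """
    return [(1.0 - r) * r ** (n - 1) for n in range(1, n_max + 1)]


def expected_future_profit(profits: list[float], zeta: list[float]) -> float:
    """Assumed form: sum over generations of zeta_n times the expected profit in generation n."""
    if len(profits) != len(zeta):
        raise ValueError("need one importance weight per generation")
    return sum(z * p for z, p in zip(zeta, profits))


zeta = geometric_importance(r=0.8, n_max=200)
print(sum(zeta))                                   # close to 1, since 0.8**200 is negligible
print(expected_future_profit([10.0] * 200, zeta))  # close to 10 for a constant profit
```

A larger $r$ places more importance on distant generations; $r \to 0$ concentrates all importance on generation 1.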
Definition 10. For an idealized population, the aggregate genotype of individual $i$ is defined as where $\bar c \in C$ is the vector with default contributions, $\tilde e_i = e_i - 2\bar c^{s_i}$, and $e_i$ denotes the $i$-th standard unit vector.
Note that the aggregate genotype $T_i$ measures the increase in the breed's future profit that results from using individual $i$ for breeding.
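A numerical illustration of this idea: the Python sketch below builds a hypothetical default contribution vector $\bar c$ (equal contributions within each sex), the direction $\tilde e_i = e_i - 2\bar c^{s_i}$, and the rate of change of a toy profit function along $\tilde e_i$, computed by a central finite difference. The toy profit (one trait, quadratic loss around a hypothetical optimum) and the assumption that the aggregate genotype is exactly this directional derivative, without further scaling, are illustrative choices only.

```python
import numpy as np


def default_contributions(sexes):
    """Hypothetical default c_bar: equal contributions within each sex (each sex sums to 0.5)."""
    sexes = np.asarray(sexes)
    c = np.zeros(len(sexes))
    for s in ("m", "f"):
        mask = sexes == s
        c[mask] = 0.5 / mask.sum()
    return c


def direction(i, c_bar, sexes):
    """e_tilde_i = e_i - 2 * c_bar^{s_i}; keeps each sex's total contribution at 0.5."""
    sexes = np.asarray(sexes)
    e = np.zeros(len(c_bar))
    e[i] = 1.0
    c_same_sex = np.where(sexes == sexes[i], c_bar, 0.0)
    return e - 2.0 * c_same_sex


def aggregate_genotype(f, i, c_bar, sexes, h=1e-6):
    """Central finite-difference derivative of f at c_bar along e_tilde_i."""
    d = direction(i, c_bar, sexes)
    return (f(c_bar + h * d) - f(c_bar - h * d)) / (2.0 * h)


# Toy example with hypothetical numbers: profit = -(offspring mean - optimum)^2.
sexes = ["m", "m", "f", "f"]
tbv = np.array([1.0, 3.0, 0.0, 2.0])
mu0, opt = 0.0, 5.0


def profit(c):
    return -(mu0 + tbv @ c - opt) ** 2


c_bar = default_contributions(sexes)
t_numeric = aggregate_genotype(profit, 1, c_bar, sexes)
grad = -2.0 * (mu0 + tbv @ c_bar - opt) * tbv   # analytic gradient at c_bar
t_taylor = direction(1, c_bar, sexes) @ grad    # e_tilde_i' grad f (first-order Taylor)
print(t_numeric, t_taylor)                      # both approximately 7.0
```

The direction $\tilde e_i$ sums to zero within each sex, so moving along it keeps the contributions in $C$ (for small steps), which is why the finite difference and the gradient-based value agree.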
Lemma 4. For an idealized population, the aggregate genotype satisfies where $\nabla \tilde f^\pi|_{\bar c}$ is the gradient of $\tilde f^\pi$ at $\bar c$.

Proof:
The function $\tilde f^\pi$ is differentiable, so the limit does not change when $\tilde f^\pi$ is replaced by its first-order Taylor approximation
$$\tilde f^\pi(\bar c + \lambda \tilde e_i) \approx \tilde f^\pi(\bar c) + (\lambda \tilde e_i)^{\top} \nabla \tilde f^\pi|_{\bar c}.$$
Thus,

Theorem 1. For an idealized population, the aggregate genotype satisfies is called the vector with true genetic values of individual $i$, and is called the vector with economic weights. Here, the derivative of a vector with respect to a vector is the Jacobian matrix of partial derivatives, $\xi_n(\theta_1) = \xi(\Gamma_\pi^{n-1}(\theta_1))$, $\bar\xi_n = \xi_n(\bar\theta_1)$, and $\bar\theta_1 = \theta_1(\bar c)$.

Proof: □

Theorem 2. Let $v(\theta_1)$ be the vector with economic weights, let $u_i$ be the $L$-vector with true genetic values of animal $i$, and let $\hat u_i$ be an estimate thereof. Suppose that either animal $i$ is chosen at random from a specific class of animals, or that $u_i$ and $\hat u_i$ are random for other reasons. Let Then, the function

Proof:
The proof is analogous to the corresponding proof in traditional index selection.
The function has its maximum at the same position as the function $\tilde r$ We have $\tilde r$ Thus, Consequently,

The equation holds for
whereby the maximum is attained at but also at $\lambda b_i$ for any $\lambda > 0$.
□
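For comparison with traditional index theory, here is a minimal Python sketch of the classical (Smith-Hazel) index weights $b = \mathrm{Var}(\hat u)^{-1}\,\mathrm{Cov}(\hat u, u)\,v$, which maximize the correlation between the index $b^\top \hat u$ and the aggregate genotype $v^\top u$. It is not claimed to reproduce the exact statement of Theorem 2. The covariance matrices are hypothetical, with $\hat u = u + e$ and $e$ independent of $u$. The final lines check that $\lambda b$ with $\lambda > 0$ attains the same correlation and that random alternative weights do no better.

```python
import numpy as np


def index_weights(P, G, v):
    """Classical index weights b = P^{-1} G v with P = Var(u_hat), G = Cov(u_hat, u)."""
    return np.linalg.solve(P, G @ v)


def index_accuracy(b, P, G, C, v):
    """Correlation between the index b'u_hat and the aggregate genotype v'u, with C = Var(u)."""
    return (b @ G @ v) / np.sqrt((b @ P @ b) * (v @ C @ v))


# Hypothetical example: u_hat = u + e with e independent of u.
C = np.array([[1.0, 0.3], [0.3, 0.5]])  # Var(u)
R = np.diag([1.0, 2.0])                 # Var(e)
G = C                                   # Cov(u_hat, u) = Var(u) when e is independent of u
P = C + R                               # Var(u_hat)
v = np.array([1.0, 2.0])                # economic weights

b = index_weights(P, G, v)
r_opt = index_accuracy(b, P, G, C, v)
print(r_opt, index_accuracy(3.0 * b, P, G, C, v))  # identical: scaling by lambda > 0 does not matter

rng = np.random.default_rng(0)
others = [index_accuracy(rng.normal(size=2), P, G, C, v) for _ in range(1000)]
print(max(others) <= r_opt + 1e-12)                 # True: no random weight vector does better
```

The optimality follows from the Cauchy-Schwarz inequality, which is also why the maximizer is only determined up to a positive factor.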

Combination of Breeding Objectives
Definition 11. We call a total merit function "general" if it combines several partial merit functions into a single one. That is, a total merit function is general if there is a set $O$ of breeding objectives and a partial merit function $\mathrm{PM}_o : \mathbb{R}^K \to \mathbb{R}$ for each breeding objective $o$, such that for all $y \in \mathbb{R}^K$.
Lemma 5. For a general total merit function, the profit function is is called the breed's profit with respect to breeding objective $o$, and $Y$ is the $K$-vector with phenotypic values of an individual that is randomly chosen from a population with parameter $\xi$.

Proof:

Theorem 3. For an idealized population and a general total merit function, the vector with economic weights satisfies is called the vector with economic weights for breeding objective $o$. Here, the derivative of a vector with respect to a vector is the Jacobian matrix of partial derivatives, $\xi_n(\theta_1) = \xi(\Gamma_\pi^{n-1}(\theta_1))$, $\bar\xi_n = \xi_n(\bar\theta_1)$, and $\bar\theta_1 = \theta_1(\bar c)$.

Proof:

Definition 12. We call a general total merit function "distance-based" and "piecewise linear" if all partial merit functions have the representation where $\omega_{ok}$ is the weight of trait $k$, $\mathrm{Opt}_{ok}$ is the optimum of trait $k$ for breeding objective $o$, and $\tau_{\max}$ is an arbitrary constant.
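The following Python sketch estimates the profit $\varphi(\xi) = E_\xi(\mathrm{TM}(Y))$ by Monte Carlo for normally distributed traits. Because the formulas of Definitions 11 and 12 are not reproduced above, the sketch assumes the distance-based, piecewise-linear form $\mathrm{PM}_o(y) = \tau_{\max} - \sum_k \omega_{ok}\,|y_k - \mathrm{Opt}_{ok}|$, and that a general total merit function is the plain sum $\sum_{o \in O} \mathrm{PM}_o$. Both are assumptions, as are the function names and numbers. Traits are simulated independently, which suffices here because under this form only the marginal trait distributions enter the expectation.

```python
import numpy as np


def partial_merit(y, opt, omega, tau_max):
    """Assumed form tau_max - sum_k omega_k * |y_k - opt_k| for each row of y (shape n x K)."""
    return tau_max - np.abs(y - opt) @ omega


def total_merit(y, objectives, tau_max):
    """Assumed combination rule: sum of the partial merits over all breeding objectives."""
    return sum(partial_merit(y, opt, omega, tau_max) for opt, omega in objectives)


def monte_carlo_profit(mu, sd, objectives, tau_max, n=200_000, seed=1):
    """Estimate E[TM(Y)] for independent traits Y_k ~ N(mu_k, sd_k^2)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sd, size=(n, len(mu)))
    return float(total_merit(y, objectives, tau_max).mean())


# Hypothetical example with K = 2 traits and two breeding objectives.
mu, sd = np.array([0.0, 0.0]), np.array([1.0, 1.0])
obj_a = (np.array([0.0, 0.0]), np.array([1.0, 1.0]))   # (optima, trait weights)
obj_b = (np.array([1.0, -1.0]), np.array([0.5, 2.0]))
phi = monte_carlo_profit(mu, sd, [obj_a, obj_b], tau_max=10.0)
phi_a = monte_carlo_profit(mu, sd, [obj_a], tau_max=10.0)
phi_b = monte_carlo_profit(mu, sd, [obj_b], tau_max=10.0)
print(phi, phi_a + phi_b)  # equal up to rounding: the profit splits over objectives
```

Using the same seed for all three calls makes the decomposition exact up to floating-point rounding, which isolates the linearity of the expectation from Monte Carlo noise.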
Definition 13. The parameter space $\Xi$ for a piecewise linear total merit function and normally distributed traits consists of all tuples $\xi = (\mu, \sigma_P)$ with $\mu \in \mathbb{R}^K$ and $\sigma_P \in \mathbb{R}^K_{>0}$.
Alternatively, the parameter space could be defined as consisting of all tuples $\xi = (\mu, \Sigma_P)$, where $\Sigma_P$ is the phenotypic covariance matrix. The next lemma shows that this is unnecessary, because the profit of the population does not depend on the phenotypic correlations.
Lemma 6. Suppose that the total merit function is distance-based and piecewise linear, and that all traits are normally distributed. Then, the profit of the population for breeding objective $o$ is The last equality holds because $|Y_k - \mathrm{Opt}_{ok}|$ has a folded normal distribution.

□
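Lemma 6 rests on the fact that $|Y_k - \mathrm{Opt}_{ok}|$ has a folded normal distribution. The standard closed form of the folded normal mean, $E|X| = \sigma\sqrt{2/\pi}\,e^{-m^2/(2\sigma^2)} + m\,(1 - 2\Phi(-m/\sigma))$ for $X \sim N(m, \sigma^2)$, is sketched below in Python and checked against a simulation. Plugging it into the profit for one objective assumes the merit form $\tau_{\max} - \sum_k \omega_{ok}|y_k - \mathrm{Opt}_{ok}|$, which is not reproduced in the text above; the helper names are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def folded_normal_mean(m, s):
    """E|X| for X ~ N(m, s^2)."""
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))
            + m * (1.0 - 2.0 * normal_cdf(-m / s)))


def profit_for_objective(mu, sd, opt, omega, tau_max):
    """phi_o under the assumed merit tau_max - sum_k omega_k |y_k - opt_k|."""
    return tau_max - sum(w * folded_normal_mean(m - o, s)
                         for m, s, o, w in zip(mu, sd, opt, omega))


# Check the closed form against a simulation (hypothetical mean 1.0, SD 2.0).
random.seed(3)
sim = sum(abs(random.gauss(1.0, 2.0)) for _ in range(100_000)) / 100_000
print(folded_normal_mean(1.0, 2.0), sim)  # close agreement
```

Only the mean and standard deviation of each trait enter `profit_for_objective`, which mirrors the statement that the profit does not depend on the phenotypic correlations.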
Lemma 7. Suppose that the total merit function is distance-based and piecewise linear, and that all traits are normally distributed. Then, the profit of the population for breeding objective $o$ has partial derivatives with $\xi = (\mu, \sigma_P)$, and

Proof: Second equation:

Theorem 4. Suppose that the total merit function is distance-based and piecewise linear, and that all traits are normally distributed. Then, the vector with economic weights is

Note that $f_i$ could in practice be well approximated by the average kinship of individual $i$ with the population.
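The partial derivatives in Lemma 7 involve the derivatives of the folded normal mean. For $X \sim N(m, \sigma^2)$ these are the standard results $\partial E|X|/\partial m = 2\Phi(m/\sigma) - 1$ and $\partial E|X|/\partial \sigma = \sqrt{2/\pi}\,e^{-m^2/(2\sigma^2)}$. Identifying them with the functions $\chi^m$ and $\chi^v$ of Definition 1 is an assumption, suggested by the proof of Lemma 8 ($\chi^m_k = \pm 1$ and $\chi^v_k = 0$ far from the optimum). The Python sketch below, again assuming the merit form $\tau_{\max} - \sum_k \omega_{ok}|y_k - \mathrm{Opt}_{ok}|$, shows the two limiting cases: far from the optimum the weight on the mean is $\pm\omega_{ok}$ and the weight on the standard deviation vanishes (directional selection), while at the optimum the weight on the mean vanishes and only the standard deviation is penalized (stabilizing selection).

```python
import math


def folded_normal_mean(m, s):
    """E|X| for X ~ N(m, s^2); here 2*Phi(x) - 1 is written as erf(x / sqrt(2))."""
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))
            + m * math.erf(m / (s * math.sqrt(2.0))))


def d_mean(m, s):
    """dE|X|/dm = 2*Phi(m/s) - 1, which tends to +-1 far from the optimum."""
    return math.erf(m / (s * math.sqrt(2.0)))


def d_sd(m, s):
    """dE|X|/ds = sqrt(2/pi) * exp(-m^2 / (2 s^2)), which tends to 0 far from the optimum."""
    return math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))


def economic_weights(mu, sd, opt, omega):
    """Derivatives of tau_max - sum_k omega_k E|Y_k - opt_k| w.r.t. the means and the SDs."""
    w_mean = [-w * d_mean(m - o, s) for m, s, o, w in zip(mu, sd, opt, omega)]
    w_sd = [-w * d_sd(m - o, s) for m, s, o, w in zip(mu, sd, opt, omega)]
    return w_mean, w_sd


# Hypothetical trait with weight 2: mean 5 SD below the optimum, then exactly at the optimum.
print(economic_weights([0.0], [1.0], [5.0], [2.0]))  # mean weight near +2, SD weight near 0
print(economic_weights([5.0], [1.0], [5.0], [2.0]))  # mean weight 0, SD weight -2*sqrt(2/pi)
```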

Combining Ability
Definition 15. The true combining ability between individuals $i$ and $j$ is defined as where the random $K$-vector $Y_{\mathrm{Off}(i,j)}$ denotes the vector with trait measurements of a randomly chosen offspring of individuals $i$ and $j$.
Theorem 5. Suppose that the total merit function is distance-based and piecewise linear, and that all traits are normally distributed. Then, the true combining ability between individuals $i$ and $j$ is where the partial combining ability of individuals $i$ and $j$ with respect to breeding objective $o$ equals For an additive genetic model, $\mu_{ijk} = \mu_k + \frac{\mathrm{TBV}_{ik} + \mathrm{TBV}_{jk}}{2}$ is the mean, and $\sigma^2_{ijk} = \mathrm{MV}_{ik} + \mathrm{MV}_{jk} + \sigma^2_{Ek}$ is the variance of the trait value of a randomly chosen offspring, $\sigma^2_{Ek}$ is the environmental variance of trait $k$, $\mu_k$ is the mean of trait $k$ in the base population, and $\mathrm{MV}_{ik}$ is the variance of the Mendelian sampling term for trait $k$ that is transmitted by animal $i$ to its offspring.

Proof:
Let $Y_{\mathrm{Off}(i,j)}$ denote the random vector with trait measurements of a randomly chosen offspring of individuals $i$ and $j$. Then, where $\mu_{ijk} - \mathrm{Opt}_{ok}$ is the mean and $\sigma^2_{ijk}$ is the variance of the normally distributed random variable $\Delta_{ijok} = Y_{\mathrm{Off}(i,j)k} - \mathrm{Opt}_{ok}$. The mean and variance can be calculated as follows. The vector $Y_{\mathrm{Off}(i,j)}$ has the representation where $E_k$ is the environmental effect on the phenotype of the offspring. Consequently, the random variable $\Delta_{ijok}$ has mean
$$\mu_{ijk} - \mathrm{Opt}_{ok} = \mu_k + \frac{\mathrm{TBV}_{ik} + \mathrm{TBV}_{jk}}{2} - \mathrm{Opt}_{ok},$$
and variance
$$\sigma^2_{ijk} = \mathrm{MV}_{ik} + \mathrm{MV}_{jk} + \sigma^2_{Ek}.$$
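The offspring mean and variance are enough to evaluate a combining ability numerically. The Python sketch below computes a partial combining ability for one breeding objective as the expected merit of the offspring, assuming the merit form $\tau_{\max} - \sum_k \omega_{ok}|y_k - \mathrm{Opt}_{ok}|$, the offspring mean $\mu_k + (\mathrm{TBV}_{ik} + \mathrm{TBV}_{jk})/2$, and the offspring variance $\mathrm{MV}_{ik} + \mathrm{MV}_{jk} + \sigma^2_{Ek}$ (which presumes independent Mendelian sampling and environmental terms). The numbers are hypothetical. They illustrate that, with the mean at the optimum, the pair with the smaller Mendelian sampling variances obtains the higher combining ability.

```python
import math


def folded_normal_mean(m, s):
    """E|X| for X ~ N(m, s^2)."""
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * s * s))
            + m * math.erf(m / (s * math.sqrt(2.0))))


def partial_combining_ability(mu, tbv_i, tbv_j, mv_i, mv_j, var_e, opt, omega, tau_max):
    """Expected (assumed) partial merit of a random offspring of i and j for one objective."""
    value = tau_max
    for k in range(len(mu)):
        mean_k = mu[k] + 0.5 * (tbv_i[k] + tbv_j[k])    # offspring mean
        sd_k = math.sqrt(mv_i[k] + mv_j[k] + var_e[k])  # offspring phenotypic SD
        value -= omega[k] * folded_normal_mean(mean_k - opt[k], sd_k)
    return value


# Hypothetical single trait whose population mean already equals the optimum.
common = dict(mu=[100.0], tbv_i=[0.0], tbv_j=[0.0], var_e=[4.0],
              opt=[100.0], omega=[1.0], tau_max=0.0)
ca_high_mv = partial_combining_ability(mv_i=[1.0], mv_j=[1.0], **common)
ca_low_mv = partial_combining_ability(mv_i=[0.25], mv_j=[0.25], **common)
print(ca_high_mv, ca_low_mv)  # the pair with less Mendelian sampling variance scores higher
```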

Simplistic Genetic Model
Lemma 8. Suppose that the total merit function is distance-based and piecewise linear, that the traits are normally distributed, and that
1) phenotypic and genetic variances and covariances are constant,
2) the trait optima are several phenotypic standard deviations away from the trait means,
3) the same selection index will be used in all future generations until $\zeta_n$ approaches 0.
Then, the partial aggregate genotype for breeding objective $o$ is where $v_{ok} = \pm\omega_{ok}$.

Proof:
Assumption 1) implies that $\frac{\partial \sigma^2_{Pnk}}{\partial c_i}(\bar c) = 0$. Consequently, all information on the selection candidates that can affect their aggregate genotypes is given by and $\frac{\partial \mu_{nk}}{\partial c_i}(\bar c)$. As the same selection index will be used in all relevant future generations, we have
$$\mu_n = \mu_1 + (n-1)\Delta\mu = \mu_0 + \mathrm{TBV}^{\top} c + (n-1)\Delta\mu,$$
where the change $\Delta\mu$ of the population mean from one generation to the next is a constant. Consequently, Hence, $u_i = \mathrm{TBV}_i$ is the $K$-vector with true breeding values of the individual. The corresponding population parameter $\theta_n$ with $\frac{\partial \theta_1}{\partial c_i} = \mathrm{TBV}_i$ is $\theta_n = \mu_n$. That is, the state of a population is completely described by the vector with trait means. As the trait optima are several phenotypic standard deviations away from the trait means, we have approximately $\chi^m_k(\xi_n) = \pm 1$ and $\chi^v_k(\xi_n) = 0$. Moreover, where $(*)$ holds because $\mu_n$ and $\mu_1$ differ only by a constant. Consequently, The vector with economic weights is thus is the variance of the parent average, and is the variance of the Mendelian sampling terms that are transmitted by the parents to the offspring. In a random mating population,
$$\sigma^2_{PA1k} = \frac{1}{4}\mathrm{Var}(\mathrm{TBV}_{Ik}) + \frac{1}{4}\mathrm{Var}(\mathrm{TBV}_{Jk}).$$

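Proof:
The breeding value of offspring $O$ has the representation
$$\mathrm{TBV}_{Ok} = \frac{\mathrm{TBV}_{Ik} + \mathrm{TBV}_{Jk}}{2} + \mathrm{MT}_{Ik} + \mathrm{MT}_{Jk}.$$
The assumption of random mating ensures that $\mathrm{Cov}(\mathrm{TBV}_{Ik}, \mathrm{TBV}_{Jk}) = 0$, so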

□
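The variance decomposition for the parent average and the Mendelian sampling terms can be checked by simulation. The Python sketch below, with hypothetical variances, draws sire and dam breeding values independently (random mating) together with independent Mendelian sampling terms, forms $\mathrm{TBV}_{Ok} = (\mathrm{TBV}_{Ik} + \mathrm{TBV}_{Jk})/2 + \mathrm{MT}_{Ik} + \mathrm{MT}_{Jk}$, and compares its empirical variance with $\sigma^2_{PA} + \sigma^2_{MT}$. Here $\sigma^2_{PA} = \frac14\mathrm{Var}(\mathrm{TBV}_{Ik}) + \frac14\mathrm{Var}(\mathrm{TBV}_{Jk})$, and $\sigma^2_{MT}$ is taken to be the sum of the two parents' Mendelian sampling variances (an assumption of this sketch).

```python
import numpy as np


def offspring_tbv_variance(var_sire, var_dam, mv_sire, mv_dam, n=500_000, seed=42):
    """Monte Carlo variance of TBV_O = (TBV_I + TBV_J)/2 + MT_I + MT_J under random mating."""
    rng = np.random.default_rng(seed)
    tbv_i = rng.normal(0.0, np.sqrt(var_sire), n)
    tbv_j = rng.normal(0.0, np.sqrt(var_dam), n)  # drawn independently of tbv_i: random mating
    mt_i = rng.normal(0.0, np.sqrt(mv_sire), n)
    mt_j = rng.normal(0.0, np.sqrt(mv_dam), n)
    return float(np.var(0.5 * (tbv_i + tbv_j) + mt_i + mt_j))


def predicted_variance(var_sire, var_dam, mv_sire, mv_dam):
    """sigma2_PA + sigma2_MT with sigma2_PA = Var(TBV_I)/4 + Var(TBV_J)/4."""
    sigma2_pa = 0.25 * var_sire + 0.25 * var_dam
    sigma2_mt = mv_sire + mv_dam
    return sigma2_pa + sigma2_mt


print(offspring_tbv_variance(4.0, 1.0, 0.25, 0.25), predicted_variance(4.0, 1.0, 0.25, 0.25))
```

Under assortative mating the sire and dam values would be correlated and a covariance term $\frac12\mathrm{Cov}(\mathrm{TBV}_{Ik}, \mathrm{TBV}_{Jk})$ would have to be added.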
Theorem 6. Suppose that the population is random mating, and that the environmental variance is constant. Let $\mu_{1k}$ denote the mean, and $\sigma^2_{P1k}$ the phenotypic variance of trait $k$ in Generation 1. Let $\sigma^2_{PA1k}$ and $\sigma^2_{MT1k}$ be as in Lemma 9. Then,

Proof: Moreover, Let animal $I$ be of the same sex as animal $i$. Thus, Let $I_m$ be the set of individuals that have the same sex as individuals $i$ and $I$. Since Thus,

Theorem 7. Let $\sigma^2_{MT2k}$ be the variance of the Mendelian sampling terms that are received by a randomly chosen individual from generation 2. Suppose that animal $i$ has the same genetic contribution to generation 2 as to generation 1, and that the genetic contribution of an animal is independent of its Mendelian sampling variance. Then,

Proof:
The equation is shown for the case that individual $i$ is male. The genetic contributions of the animals to Generation 1 are fixed parameters, so, for ease of notation, the animals from Generation 1 can be assumed to be ordered such that the set $O_i$ of offspring of animal $i$ is a fixed set. The mating partners are assigned at random. Let $I_2$ and $J_2$ be the sire and dam of a randomly chosen individual from Generation 2, and let $G$ denote the genomes of the individuals from Generation 2. We have where $I_1$ is the set of individuals, $I_{1m}$ is the set of males, and $I_{1f}$ is the set of females from Generation 1. It is assumed that where $c^{\mathrm{mp}}_i = c_i$ is the genetic contribution that is attributed to the mating partners. Thus, Consequently,

where $\mathrm{TBV}_{ik}$ is called the polygenic breeding value, $a_{qk}$ is the true additive effect of QTL $q$ on trait $k$, and $p_{0q}$ is the frequency of the alternative allele of QTL $q$ in the current generation. Furthermore, the Mendelian sampling term that is transmitted by individual $i$ has the representation where it is assumed that the random allele content $U_{iq} \in \{0, 1\}$ of QTL $q$ for the haplotype that was transmitted by individual $i$, and the part $\mathrm{MT}_{ik}$ of the Mendelian sampling term that is due to unknown QTL, satisfy the following conditions: the random variables $U_{iq}$ and $\mathrm{MT}_{ik}$ are independent, and $\mathrm{MT}_{ik}$ is a random variable with mean 0 and variance $\frac{1 - F_i}{4}\sigma^2_{Ak}$, where $\sigma^2_{Ak}$ is the polygenic variance of trait $k$ in a non-inbred, unselected and random-mating population.
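The QTL-based model can be illustrated with a short simulation. The Python sketch below samples, for each QTL, the allele transmitted by individual $i$ (with allele content 0, 1, or 2 in the parent, the alternative allele is transmitted with probability 0, 1/2, or 1) and adds an independent polygenic Mendelian term with variance $\frac{1-F_i}{4}\sigma^2_{Ak}$. It compares the empirical variance with $\sum_q a_{qk}^2/4$ over heterozygous QTL plus $\frac{1-F_i}{4}\sigma^2_{Ak}$. That closed form, the assumption that different QTL are transmitted independently (no linkage), the function names and all numbers are assumptions of this sketch and are not the statement of Theorem 8.

```python
import numpy as np


def mendelian_variance(allele_content, effects, F, sigma2_poly):
    """Assumed closed form: a_q^2 / 4 per heterozygous QTL plus (1 - F) / 4 * sigma2_poly."""
    x = np.asarray(allele_content)
    a = np.asarray(effects, dtype=float)
    return float(np.sum(a[x == 1] ** 2) / 4.0 + (1.0 - F) / 4.0 * sigma2_poly)


def simulate_mendelian_variance(allele_content, effects, F, sigma2_poly, n=400_000, seed=7):
    """Monte Carlo variance of the transmitted Mendelian sampling term (unlinked QTL)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(allele_content, dtype=float)
    a = np.asarray(effects, dtype=float)
    p = x / 2.0                                      # P(U_iq = 1) for allele content x_q
    u = (rng.random((n, x.size)) < p).astype(float)  # transmitted allele contents U_iq
    qtl_part = (u - p) @ a                           # centred QTL contribution
    poly = rng.normal(0.0, np.sqrt((1.0 - F) / 4.0 * sigma2_poly), n)
    return float(np.var(qtl_part + poly))


# Hypothetical individual: four QTL, inbreeding coefficient 0.125, polygenic variance 2.
x_i, a_k = [1, 2, 1, 0], [1.0, 3.0, 0.5, 2.0]
print(mendelian_variance(x_i, a_k, 0.125, 2.0), simulate_mendelian_variance(x_i, a_k, 0.125, 2.0))
```

Homozygous QTL contribute nothing, so an individual that is homozygous at the major QTL transmits a less variable Mendelian sampling term.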
Theorem 8. For the QTL-based additive model with a polygenic term, the Mendelian sampling variance $\mathrm{MV}_{ik}$ of individual $i$ for trait $k$ is

Proof: The Mendelian sampling term of individual $i$ for trait $k$ has the representation The independence of the random variables $Z_{iq}$ and $\mathrm{MT}_{ik}$ implies that □

Theorem 9. For the QTL-based additive model with a polygenic term, the expected Mendelian sampling variance of an offspring $o_i$ of individual $i$ is where the expected heterozygosity of an offspring of individual $i$ at QTL $q$ is Here, $p^{\bar s_i}_{1q}(c)$ is the frequency of the alternative allele of QTL $q$ in the haplotypes of the next generation that are received from parents of the opposite sex.

Proof: The inbreeding coefficient of an individual is the kinship of its parents, so where $J$ is a randomly chosen mating partner, and $Op$ is the set of individuals of the opposite sex to individual $i$. Furthermore, the probability that the offspring is heterozygous at QTL $q$ is where the first summand is the probability that individual $i$ transmits allele 1 and the mating partner transmits allele 0, and the second summand is the probability that individual $i$ transmits allele 0 and the mating partner transmits allele 1.
where $M$ is the set of males, and $F$ is the set of females. As the inbreeding coefficient of an individual equals the kinship of its parents, this equation shows that $F_1(c)$ is indeed the average inbreeding coefficient in the next generation. Furthermore, it follows that
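The heterozygosity argument in the proof of Theorem 9 translates directly into code. In the Python sketch below, individual $i$ transmits the alternative allele with probability $x_{iq}/2$, where $x_{iq} \in \{0, 1, 2\}$ is its allele content, and a random mating partner transmits it with probability $p^{\bar s_i}_{1q}(c)$. Computing this frequency as $\sum_j c_j x_{jq}$ over opposite-sex candidates $j$ (whose contributions sum to 0.5) is an assumption of the sketch, as are the helper names and example numbers.

```python
def partner_allele_frequency(contributions, allele_contents):
    """p = sum_j (c_j / 0.5) * (x_j / 2) = sum_j c_j * x_j over opposite-sex candidates j."""
    return sum(c * x for c, x in zip(contributions, allele_contents))


def offspring_heterozygosity(x_i, p_partner):
    """P(heterozygous offspring) = p_i (1 - p_J) + (1 - p_i) p_J with p_i = x_i / 2."""
    p_i = x_i / 2.0
    return p_i * (1.0 - p_partner) + (1.0 - p_i) * p_partner


# Hypothetical females: contributions 0.4 and 0.1 (summing to 0.5), allele contents 2 and 0.
p_f = partner_allele_frequency([0.4, 0.1], [2, 0])  # 0.8
for x in (0, 1, 2):
    print(x, offspring_heterozygosity(x, p_f))      # about 0.8, 0.5 and 0.2
```

An individual that is homozygous for the allele that is common among the mating partners produces the least heterozygous offspring, and therefore offspring with the smallest QTL contribution to the Mendelian sampling variance.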